home *** CD-ROM | disk | FTP | other *** search
- set output error
- set display page 23
- set more 1
- #delimit ;
-
- di _n(5) in wh
- " ___ ____ ____ ____ ____ tm" _n
- " /__ / ____/ / ____/" _n
- "___/ / /___/ / /___/ Survival Analysis" _n
- "-------------------------------------------------" _n(2) ;
-
-
- di in gr
- "This tutorial provides an overview of the Stata commands for performing" _n
- "survival analysis, including the estimation of Cox proportional-hazards models"
- _n "via maximum-likelihood techniques. The commands we will discuss are:" _n(2)
- in wh
- _col(18) "cox coxvar loglogs survcurv" _n
- _col(18) "coxbase gwood logrank survsum" _n
- _col(18) "coxhaz kapmeier mantel wilcoxon" _n ;
-
- di in gr
- "We will draw some graphs during this tutorial. If your computer and monitor"
- _n
- "do not have graphics capabilities, you won't be able to see them, and that's"
- _n
- "too bad, but nothing will go wrong." _n(2)
- "If you are using a Sun computer with SunView, we recommend that before run-"
- _n
- "ning this tutorial, you type '"
- in wh "window define tut" in gr "'. If you typed that when" _n
- "you ran the graphics tutorial, you needn't type it again. If you haven't" _n
- "yet typed it, break out of this tutorial by holding down "
- in wh "Ctrl" in gr " and pressing" _n
- in wh "C" in gr ", type the command, and then restart this tutorial." _n ;
-
- #delimit cr
- mac def path
- capture run nullfile.tut
- if _rc {
- mac def path "\stata\"
- capture run %path`nullfile.tut
- if _rc {
- mac def path "/usr/stata/"
- capture run %path`nullfile.tut
- if _rc {
- #delimit ;
- di in red
- "I cannot find the other tutorial files. I have looked in the current" _n
- "directory and in \stata (DOS) or /usr/stata (Unix). Is Stata installed" _n
- "correctly?" _n(2)
- "In any case, I cannot run the tutorial." ;
- #delimit cr
- exit
- }
- }
- }
- macro define F5 "do %path`contents.tut;"
- macro define F6 "do %path`survive.tut;"
-
- #delimit ;
- set more 0 ; more ; set more 1 ;
-
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "We begin with the " in wh "cox" in gr
- " command. " in wh "cox" in gr " can estimate proportional-hazards models" _n
- "on duration data, on censored duration data, and with time-varying regres-" _n
- "sors. We have some data from an experiment comparing generators with old-" _n
- "and new-style bearings. Generators were run under various overload condi-" _n
- "tions until they failed and the time, in hours, was recorded. We'll show" _n
- "you our data:"_n
- in wh _dup(79) "-" _n(4)
- ". use %path`kva, clear" ;
- noisily use %path`kva, clear ;
- di _n in wh ". describe" ;
- noisily describe ;
- set more 0 ; more ; set more 1 ;
- di _n in wh ". list" ;
- noisily list ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "All of Stata's estimation commands share the same syntax:" _n(2)
- _col(13) "<estimation-command> lhsvar rhsvar1 rhsvar2 ..." _n(2)
- "In the case of "
- in wh "cox" in gr ", the left-hand-side variable is the time-until-failure" _n
- "variable, so the syntax is:" _n(2)
- _col(13) in wh "cox "
- in gr "time-var rhsvar1 rhsvar2 ..." _n(2)
- "We wish to estimate a model of 'failtime' on 'load' and 'bearings':" _n
- in wh _dup(79) "-" _n(12)
- ". cox failtime load bearings" ;
- set more 0 ; more ; set more 1 ;
- noisily cox failtime load bearings ;
- di _n ; set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "Just as "
- in wh "cox" in gr
- " shares the same syntax with Stata's other estimation commands," _n
- in wh "cox" in gr
- " shares the same capabilities. After estimating the model, for instance,"
- _n
- "we can view the inverse information matrix either as a correlation matrix or"
- _n
- "a covariance matrix:" _n
- in wh _dup(79) "-" _n(2)
- ". correlate, _coef" ;
- noisily correlate, _coef ;
- di _n in wh ". correlate, _coef covariance" ;
- noisily correlate, _coef cov;
- di _n(2) ; set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "We can review the estimates by typing '"
- in wh "cox" in gr "' without arguments:" _n
- in wh _dup(79) "-" _n(6)
- ". cox" ;
- noisily cox ;
- di _n ; set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n
- "cox" in gr
- " can also estimate models with censored observations. We have another" _n
- "data set recording the results of a drug trial." _n
- in wh _dup(79) "-" _n(8)
- ". use %path`cancer, clear" ;
- noisily use %path`cancer, clear ;
- di _n in wh ". describe" ;
- noisily describe ;
- set more 0 ; more ; set more 1 ;
-
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "In this case, some of the observations are censored -- not all the patients"
- _n
- "have yet died. Our time variable, studytim, records the time of death or,"
- _n
- "if the patient still lives, the time of the end of the experiment. Another"
- _n
- "variable, died, records whether the patient died." _n
- in wh _dup(79) "-" _n ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "Before launching into estimation, let's examine our data:" _n
- in wh _dup(79) "-" _n(3)
- ". tabulate died, summarize(studytim)" ;
- noisily tabulate died, summ(studytim) ;
- di _n(2) in wh _dup(79) "-" _n in gr
- "Some 17 of our 48 patients are still alive and, not surprisingly, their" _n
- "survival times are longer than those who died." _n
- in wh _dup(79) "-" _n(2) ;
- set more 0 ; more ; set more 1 ;
-
- di _n(4) in wh ". tabulate drug, summarize(studytim), if died" ;
- noisily tabulate drug, summ(studytim), if died ;
- di _n(2) in wh _dup(79) "-" _n in gr
- "Among the patients who died, and whose survival times are therefore complete,"
- _n
- "we see evidence that drug 2 is better than drug 1 and that drug 3 is better" _n
- "than drug 2." _n
- in wh _dup(79) "-" _n(3) ;
- set more 0 ; more ; set more 1 ;
-
- di _n in wh ". tabulate drug, summarize(died)" ;
- noisily tab drug, summ(died) ;
- di _n(3) in wh _dup(79) "-" _n in gr
- "Looking at the fraction of patients who died, we see that drug 1 looks re-" _n
- "markably ineffective. Thus, both the completed survival times and fraction"
- _n
- "of patients who died seem to be telling the same story." _n
- in wh _dup(79) "-" _n(3) ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "We wish to estimate a model of the form:" _n(2)
- _col(8) " Probability of dying between times t and t+dt" _n
- _col(8) "h(t) = ---------------------------------------------" _n
- _col(8) " (dt) (Probability of dying after time t)" _n(2)
- _col(8) " = h (t) exp( b age + b drug + b drug )" _n
- _col(8) " 0 1 2 2 3 3" _n(2)
- "To account for the censoring, we add the "
- in wh "dead()" in gr " option to the end of the " in wh "cox" _n in gr
- "command. "
- in wh "dead()" in gr " alerts " in wh "cox" in gr
- " that there is censoring and provides the name" _n
- "of the variable that indicates whether the patient is dead or alive. So, we"
- _n
- "type:" _n ;
-
- di in wh _col(8) "cox studytim age drug2 drug3, dead(died)" _n(2)
- in gr
- "This would be fine, except that we don't yet have indicator variables for the"
- _n
- "drug=2 and drug=3 groups. Making indicator variables from categorical vari-"
- _n
- "ables is easy in Stata:" _n
- in wh _dup(79) "-" _n(2)
- ". quietly tabulate drug, gen(drug)" ;
- tabulate drug, gen(drug) ;
- set more 0 ; more ; set more 1 ;
- di _n in wh ". describe" ;
- noisily describe ;
- di _n in wh ". cox studytim age drug2 drug3, dead(died)" ;
- set more 0 ; more ; set more 1 ;
- noisily cox studytim age drug2 drug3, dead(died) ;
- di _n ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "We find that both drug 2 and drug 3 are statistically different from drug 1."
- _n
- "We might now ask whether drug 2 is different from drug 3:" _n
- in wh _dup(79) "-" _n(3)
- ". test drug2=drug3" ;
- noisily test drug2=drug3 ;
- di _n in wh _dup(79) "-" _n in gr
- "You can learn more about the "
- in wh "test" in gr " command in regress.tut and probit.tut." _n
- "Everything said there applies with equal force after "
- in wh "cox" in gr " estimation." _n
- in wh _dup(79) "-" _n(3) ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n
- in wh "cox" in gr
- " also has the ability to estimate models with time-varying regressors." _n
- "There is so much more we want to cover, however, that we will not illustrate"
- _n
- "that capability here. It is covered in great detail in the manual and you" _n
- "can even find an example in the on-line help file. If you are interested," _n
- "type '"
- in wh "help cox" in gr "' at the conclusion of this tutorial." _n
- in wh _dup(79) "-" _n(13) ;
- set more 0 ; more ; set more 1 ;
-
- drop drug1 drug2 drug3 ;
-
- di _n(2) in wh
- "Survive.Kit" _n
- "-----------" _n ;
-
- di in gr
- "Survive.Kit adds the following commands to Stata:" _n(2)
- _col(8) in wh "kapmeier" in gr _col(20) "Kaplan-Meier survival curves" _n
- _col(8) in wh "gwood" in gr _col(20)
- "Kaplan Meier survival curves with Greenwood confidence bands" _n
- _col(8) in wh "loglogs" in gr _col(20)
- "diagnostic plots for the exponential & Weibull distributions" _n(2)
- _col(8) in wh "survsum" in gr _col(20) "display survival summary statistics" _n
- _col(8) in wh "survcurv" in gr _col(20) "create survival variables" _n ;
- di in gr
- _col(8) in wh "logrank" in gr _col(20)
- "log-rank test for two or more groups" _n
- _col(8) in wh "mantel" in gr _col(20) "Mantel-Haenszel test for two groups" _n
- _col(8) in wh "wilcoxon" in gr _col(20)
- "Wilcoxon-Gehan test for two groups" _n(2)
- _col(8) in wh "coxhaz" in gr _col(20) "graph the baseline hazard after "
- in wh "cox" in gr _n
- _col(8) in wh "coxbase" in gr _col(20)
- "graph the baseline survival curve after "
- in wh "cox" in gr _n
- _col(8) in wh "coxvar" in gr _col(20)
- "create baseline survival variables after "
- in wh "cox" in gr _n ;
-
- di in gr
- "Before we can continue, we must load Survive.Kit:" _n(2)
- in wh ". run %path`Survive.Kit" ;
- set more 0 ; more ; set more 1 ;
- run %path`Survive.Kit ;
-
- di _n in wh _dup(79) "-" _n in gr
- "Unfortunately, if you are running the demonstration version of Stata, we could"
- _n
- "not provide you with the full version of Survive.Kit. We had to omit the" _n
- in wh "gwood" in gr ", "
- in wh "kapmeier" in gr ", "
- in wh "logrank" in gr ", "
- in wh "mantel" in gr ", "
- in wh "survsum" in gr ", and "
- in wh "wilcoxon" in gr " commands. These" _n
- "commands need to write temporary data sets in the process of making their" _n
- "calculations, and the demonstration version of Stata lacks that capability." _n
- "In the demonstration version, these commands are merely stubs that say " _n
- "'"
- in ye "not available in demonstration version" in gr
- "' if you try to use them." _n
- in wh _dup(79) "-" _n(2) ;
- set more 0 ; more ; set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "By the same token, if you are using the Unix version of Stata in demonstration"
- _n
- "mode, these commands will not work. They are not, however, stubs that do" _n
- "nothing; they are the real version and will fail." _n(2)
- "To get around these problems in this tutorial, we will simulate their output,"
- _n
- "making it look as if they ran fine." _n
- in wh _dup(79) "-" _n(3) ;
- set more 0 ; more ; set more 1 ;
-
-
- di _n(2) in white
- "Command Syntax" _n
- "--------------" _n ;
-
- di in gr
- "All the Survive.Kit commands have essentially the same syntax:" _n(2)
- _col(16) "<command> timevar deadvar [" in wh ", by("
- in gr "groupvar" in wh ")" in gr "]" _n(2)
- "timevar is the name of the variable recording survival time or time until" _n
- "censoring; deadvar is the name of the variable recording 1 if the observation"
- _n
- "represents a death and 0 if it represents the last time the patient was known"
- _n
- "to be alive. Not all commands allow the "
- in wh "by()" in gr " option (because it does not make" _n
- "sense) and a few require it (for the same reason). Some commands allow other"
- _n
- "options as well." _n ;
-
- di in gr
- "You can type '"
- in wh "help Survive.Kit" in gr "' at the end of this tutorial for complete de-"
- _n
- "tails on syntax." _n(2)
- "We are not going to give examples of all the Survive.Kit commands -- there"
- _n
- "are too many! But we will show a few of Survive.Kit's capabilities." _n(4) ;
- set more 0 ; more ; set more 1 ;
-
-
- di _n(13) in wh _dup(79) "-" _n in gr
- "The "
- in wh "kapmeier" in gr
- " command graphs the Kaplan-Meier survival curve. Let's take a" _n
- "look using our drug trial data. (If your computer does not have graphics" _n
- "capabilities, these commands won't do anything.)" _n
- in wh _dup(79) "-" _n(2)
- ". kapmeier studytim died" ;
- set more 0 ; more ;
- graph using %path`kapm ;
- set more 1 ;
- di in wh _n ". kapmeier studytim died, by(drug)" ;
- set more 0 ; more ;
- graph using %path`kapm2 ;
- set more 1 ;
-
- di _n(2) in wh _dup(79) "-" _n in gr
- "Adding Greenwood confidence bands is easy:" _n
- in wh _dup(79) "-" _n(2)
- ". gwood studytim died" ;
- set more 0 ; more ;
- graph using %path`gw ;
- set more 1 ;
-
- di _n in wh
- ". gwood studytim died, by(drug)" ;
- set more 0 ; more ;
- graph using %path`gw2 ;
- set more 1 ;
-
- di _n(4) in wh _dup(79) "-" _n in gr
- "The " in wh "survsum" in gr
- " command displays summary statistics -- number of observations," _n
- "number of deaths, median survival time, mean survival time -- for the sample."
- _n
- "The estimated median survival is based on the nonparametric Kaplan-Meier" _n
- "survival curve. The mean survival is based on the assumption of an exponen-"
- _n
- "tial distribution that takes censoring into account." _n
- in wh _dup(79) "-" _n(2)
- " . survsum studytim died" ;
- di _n in gr
- "Group Cases Died Median Mean" _n
- "-------------------------------------------------------" _n
- in ye "Total 48 31 17 24" ;
- di _n in wh ". survsum studytim died, by(drug)" ;
- di _n in gr
- "Group Cases Died Median Mean" _n
- "-------------------------------------------------------" _n
- in ye
- "1 20 19 8 9.4736843" _n
- "2 14 6 22 34.833332" _n
- "3 14 6 33 59.166668" _n
- "Total 48 31 . ." _n ;
- set more 0 ; more ; set more 1 ;
-
- di _n(13) in wh _dup(79) "-" _n in gr
- "The "
- in wh "loglogs" in gr
- " command produces a graph of log(-log(S(t))) vs. log(t), where" _n
- "S(t) is the survivor function defined by the Kaplan-Meier product-limit" _n
- "estimate." _n(2)
- "If the resulting curve is a straight line, the data may be assumed to come" _n
- "from a Weibull distribution." _n
- in wh _dup(79) "-" _n(2)
- ". loglogs studytim died" ;
- set more 0 ; more ;
- noisily loglogs studytim died, ylabel xlabel ;
- set more 1 ;
-
- di _n in wh ". loglogs studytim died, by(drug)" ;
- set more 0 ; more ;
- noisily loglogs studytim died, by(drug) bsize(125) ylabel xlabel border ;
- set more 1 ;
-
- di _n(7) in wh _dup(79) "-" _n in gr
- "The "
- in wh "survcurv" in gr " command adds three new variables to your data:" _n(2)
- in wh _col(8) "_surv" _col(20) in gr
- "the Kaplan-Meier product-limit estimates" _n
- _col(8) in wh "_stds" _col(20) in gr
- "the Greenwood standard deviation of the survival curve" _n
- in wh _col(8) "_vlogs" in gr
- _col(20) "the variance of the log survival" _n(2)
- "Other programs in Survive.Kit use "
- in wh "survcurv" in gr " as a utility routine and you may" _n
- "find it useful when you wish to design custom graphs or make custom estimates."
- ;
-
- di in gr
- "For instance, for the Weibull distribution, we know that" _n(2)
- _col(8) "log(-log(S(t))) = -log(lambda) + p log(t)" _n(2)
- "where S(t) is the survival function. Using the variables produced by "
- in wh "survcurv" _n in gr
- "and linear regression, we can estimate the parameters of the Weibull:" _n
- in wh _dup(79) "-" _n(2)
- ". survcurv studytim died" ;
- noisily survcurv studytim died ;
- set more 0 ; more ; set more 1 ;
- di _n in wh ". gen loglogs = log(-log(_surv))" ;
- gen loglogs = log(-log(_surv)) ;
- di _n in wh ". gen logt = log(studytim)" ;
- gen logt = log(studytim) ;
- di _n in wh ". regress loglogs logt" ;
- set more 0 ; more ; set more 1 ;
- noisily regress loglogs logt ;
- di _n ; set more 0 ; more ; set more 1 ;
-
- di _n(4) in wh _dup(79) "-" _n in gr
- "The " in wh "logrank" in gr
- " command calculates the log-rank statistic for comparing survival"
- _n
- "curves of two or more groups:" _n
- in wh _dup(79) "-" _n(2)
- ". logrank studytim died, by(drug)" ;
- di _n in gr
- "Group Events Predicted" _n
- "--------------------------------------------" _n
- in ye
- "1 19 7.2459559" _n
- "2 6 8.1984653" _n
- "3 6 15.555579" _n(2)
- in gr
- "Chi2( " in ye "2" in gr " ) = " in ye "25.526241" in gr " , P = "
- in ye "2.864e-06" _n ;
-
- di _n in wh _dup(79) "-" _n in gr
- "The results indicate that the three groups are indeed different; the P-value"
- _n
- "is less than 0.1%. Note that there are 19+6+6=31 deaths occurring (labeled" _n
- "events) and 31 deaths predicted (which is perhaps not so obvious). For the" _n
- "control group (group 1), there are 19 observed deaths, but only 7.25 would" _n
- "be expected if all patients had the same survival rate." _n
- in wh _dup(79) "-" _n ;
- set more 0 ; more ; set more 1 ;
-
- drop _all ;
- label drop _all ;
- macro define F6 "do %path`statkit.tut;" ;
-
- di _n(4) in white
- "Demonstration ends" _n
- "------------------" _n ;
-
-
- di in green
- "That concludes our short demonstration, but there's much more. We now return"
- _n
- "control to you. Some suggestions:" _n ;
-
- di in green
- "If you ..." _col(34) "Then we will show you ..." _n
- " Press " in white "F5" in green _col(38) "a table of tutorial contents" _n
- " Press " in white "F6" in green _col(38) "the next tutorial, "
- in white "statkit.tut" _n ;
-
- run %path`tobuy.tut ;